Design and Evaluation of a Scalable Engine for 3D-FFT Computation in an FPGA Cluster

Author: Ammendola Roberto
Loreti Pierpaolo
Publication venue: Insight Society
Publication date: 05/04/2019

The Three-Dimensional Fast Fourier Transform (3D-FFT) is commonly used to solve the partial differential equations describing the system evolution in several physical phenomena, such as the motion of viscous fluids described by the Navier–Stokes equations. Simulation of such problems requires the use of a parallel High-Performance Computing (HPC) architecture, since the size of the problem grows with the cube of the FFT size and the representation of a single point comprises several double-precision floating-point complex numbers. Modern HPC systems are considering the inclusion of FPGAs as components of this computing architecture because they can combine effective hardware acceleration capabilities and dedicated communication facilities. Furthermore, the network topology can be optimized for the specific calculation that the cluster must perform, especially in the case of algorithms limited by the data exchange delay between the processors. In this paper, we explore an HPC design that uses FPGA accelerators to compute the 3D-FFT. We devise a scalable FFT engine based on a custom radix-2 double-precision core that is used to implement the Decimation in Frequency version of the Cooley–Tukey FFT algorithm. The FFT engine can be adapted to different technology constraints and networking topologies by adjusting the number of cores and configuration parameters in order to minimize the overall calculation time. We compare the various possible configurations with the technological limits of available hardware. Finally, we evaluate the bandwidth required for continuous FFT execution in the APEnet toroidal mesh network.

International Journal on Advanced Science, Engineering and Information Technology
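
The abstract above names a radix-2 Decimation in Frequency (DIF) form of the Cooley–Tukey algorithm. As a rough software illustration only (the function name `fft_dif` is hypothetical, and this recursive sketch says nothing about how the paper's FPGA core is actually built), the DIF butterfly can be written as:

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT of a sequence.

    Illustrative sketch only: len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    half = n // 2
    # DIF butterfly: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    top = [x[k] + x[k + half] for k in range(half)]
    bottom = [
        (x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
        for k in range(half)
    ]
    out = [0j] * n
    out[0::2] = fft_dif(top)      # X[0], X[2], X[4], ...
    out[1::2] = fft_dif(bottom)   # X[1], X[3], X[5], ...
    return out
```

A 3D-FFT can then be obtained by applying such a 1D transform along each of the three axes in turn, which is where the data exchange between processors mentioned in the abstract comes from.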

Hydrodynamics of Unconventional Fluidized Beds: Solids Flow Patterns and Their Influence on Mixing/Segregation of a Large Flotsam Particle in a Bed of Finer Solids

Author: Chirone Riccardo
Ruoppolo Giovanna
Ammendola Paola
Solimene Roberto
Publication venue: ECI Digital Archives
Publication date: 16/05/2010

The gross circulation of the solid phase and its influence on the mixing/segregation of a large flotsam particle in beds of finer solids have been investigated in unconventional fluidized beds. A tapered two-dimensional fluidization column and a fluidization column equipped with a diverging cone as gas distributor have been adopted. The hydrodynamics of the gas-solid suspension in the two apparatuses has been qualitatively assessed by visual observation, and the trajectories of the centre of gravity of large flotsam particles have been evaluated to assess the extent of mixing/segregation.

Engineering Conferences International

High-Performance, Low-Complexity Deadlock Avoidance for Arbitrary Topologies/Routings

Author: Ammendola Roberto
Camarero Cristóbal
Concatto Caroline
Singla Ankit
Skeie Tor
Publication date: 01/01/2018

Recently, the use of graph-based network topologies has been proposed as an alternative to traditional networks such as tori or fat-trees due to their very good topological characteristics. However, they pose practical implementation challenges such as the lack of deadlock avoidance strategies. Previous proposals are either exceedingly complex, underutilise network resources or lack flexibility. We propose, and prove formally, three generic, low-complexity deadlock avoidance mechanisms that only require local information. The main strengths of our method are its topology- and routing-independence and that the virtual channel count is bounded by the length of the longest path. We evaluate our proposed mechanisms against previous proposals through an extensive simulation study to measure the impact on performance using both synthetic and realistic traffic. First, we compare against a well-known HPC mechanism for dragonfly and achieve a similar performance level. Then we move to graph-based networks and show that our mechanisms can greatly outperform traditional, spanning-tree based mechanisms, even if these use a much larger number of virtual channels. Overall, we find that our proposal provides a simple, flexible and high-performance deadlock-avoidance solution.

Crossref

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

The University of Manchester - Institutional Repository
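
The deadlock-avoidance abstract above states that the virtual channel (VC) count is bounded by the length of the longest path. One classic scheme with that property, shown here only as an assumed illustration and not necessarily one of the paper's three mechanisms, is to let a packet use VC `i` on its `i`-th hop, so that channel dependencies only ever point from a lower VC to a higher one. The helper names and the four-node ring below are hypothetical:

```python
def channel_dependencies(paths, use_hop_vc):
    """Return the edges of the channel dependency graph for the given routes.

    A channel is (src, dst, vc). With use_hop_vc, hop i travels on VC i;
    otherwise every hop shares VC 0.
    """
    deps = set()
    for path in paths:
        channels = [
            (path[i], path[i + 1], i if use_hop_vc else 0)
            for i in range(len(path) - 1)
        ]
        # A packet holding one channel waits for the next one on its route.
        deps.update(zip(channels, channels[1:]))
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    graph = {}
    for a, b in deps:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {}  # 1 = on the current DFS stack, 2 = fully explored

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1:
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in graph)


# Hypothetical example: 4-node unidirectional ring, each node sends two hops ahead.
ring_paths = [[i, (i + 1) % 4, (i + 2) % 4] for i in range(4)]
single_vc_deadlocks = has_cycle(channel_dependencies(ring_paths, use_hop_vc=False))
hop_vc_deadlocks = has_cycle(channel_dependencies(ring_paths, use_hop_vc=True))
```

In this toy ring the single-VC dependency graph contains a cycle, while the hop-indexed variant is acyclic and needs two VCs, equal to the longest route length in hops.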

APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

Author: Ammendola Roberto
Biagioni Andrea
Lo Cicero Francesca
Frezza Ottorino
Lonardo Alessandro
Paolucci Pier
Petronzio Roberto
Rossetti Davide
Salamon Andrea
Salina Gaetano
Simula Francesco
Tantalo Nazario
Tosoratto Laura
Vicini Piero
Publication date: 01/01/2010

Many scientific computations need multi-node parallelism to meet ever-increasing requirements in both space (memory) and time (speed). The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may potentially result in large overheads due to the complex memory hierarchy. Additionally, the most demanding problems may easily employ more than a Petaflops of sustained computing power, requiring thousands of GPUs orchestrated with some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes with linear cost, thus improving the price/performance ratio on large clusters. The project target is the development of the Apelink+ host adapter featuring a low-latency, high-bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe x8 Gen2 host interface. It features hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing for painless porting of standard applications. Finally, we give an insight into future work and intended developments.

arXiv.org e-Print Archive

ART
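
The APEnet+ entry above describes a 3D toroidal direct network. As a minimal sketch of what that topology implies (the function names and the example mesh size are assumptions, not part of any APEnet+ API), each node has six nearest neighbours with wrap-around at the edges, which is also why the number of links grows linearly with the node count:

```python
def torus_neighbors(coord, dims):
    """Return the six nearest neighbours of a node in a 3D torus."""
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),  # X+ / X- links
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),  # Y+ / Y- links
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),  # Z+ / Z- links
    ]


def torus_distance(a, b, dims):
    """Minimum hop count between two nodes, using wrap-around links."""
    return sum(min((p - q) % n, (q - p) % n) for p, q, n in zip(a, b, dims))
```

For example, in an assumed 4x4x4 mesh, node (0, 0, 0) is one hop away from (3, 0, 0) thanks to the wrap-around link.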